Hanson–Wright inequality in Hilbert spaces with application to $K$-means clustering for non-Euclidean data
Abstract
We derive a dimension-free Hanson–Wright inequality for quadratic forms of independent sub-gaussian random variables in a separable Hilbert space. Our inequality is an infinite-dimensional generalization of the classical Hanson–Wright inequality for finite-dimensional Euclidean random vectors. We illustrate an application to the generalized $K$-means clustering problem for non-Euclidean data. Specifically, we establish the exponential rate of convergence for a semidefinite relaxation of the generalized $K$-means, which together with a simple rounding algorithm implies exact recovery of the true clustering structure.
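For orientation, the classical finite-dimensional inequality that the abstract refers to is usually stated roughly as follows. This is a sketch of the standard textbook form, not a formula taken from this record: the symbols $X$, $A$, $n$, $t$, the sub-gaussian bound $\kappa$, and the absolute constant $c$ are notation assumed here. Let $X = (X_1, \dots, X_n)$ have independent, mean-zero coordinates with sub-gaussian norms $\|X_i\|_{\psi_2} \le \kappa$, and let $A$ be a fixed $n \times n$ matrix. Then for every $t \ge 0$,

$$
\mathbb{P}\left( \left| X^\top A X - \mathbb{E}\, X^\top A X \right| > t \right) \le 2 \exp\left( -c \min\left\{ \frac{t^2}{\kappa^4 \|A\|_F^2}, \; \frac{t}{\kappa^2 \|A\|_{\mathrm{op}}} \right\} \right),
$$

where $\|A\|_F$ is the Frobenius norm and $\|A\|_{\mathrm{op}}$ is the operator norm. The bound depends on $A$ only through these two norms and not explicitly on $n$, which is the sense in which a result of this type can be dimension-free.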
Similar resources
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
Parallel K-Means Clustering with Triangle Inequality
Clustering divides data objects into groups to minimize the variation within each group. This technique is widely used in data mining and other areas of computer science. K-means is a partitional clustering algorithm that produces a fixed number of clusters through an iterative process. The relative simplicity and obvious data parallelism of the K-means algorithm make it an excellent candidate ...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters or clustering is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...
Clustering Stable Instances of Euclidean k-means
The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst-case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd’s algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to desi...
Spatial Analysis in curved spaces with Non-Euclidean Geometry
The ultimate goal of spatial information, both as part of technology and as science, is to answer questions and issues related to space, place, and location. Therefore, geometry is widely used for description, storage, and analysis. Undoubtedly, one of the most essential features of spatial information is geometric features, and one of the most obvious types of analysis is the geometric type an...
Journal
Journal title: Bernoulli
Year: 2021
ISSN: ['1573-9759', '1350-7265']
DOI: https://doi.org/10.3150/20-bej1251